# Zero-shot Learning
## Magma 8B GGUF
MIT · Mungert · 545 downloads · 1 like
Magma-8B is an image-text-to-text model in the GGUF format, suitable for multimodal tasks.
Tags: Image-to-Text

## Arshgpt
MIT · arshiaafshani · 69 downloads · 5 likes
Transformers is an open-source library developed by Hugging Face, providing various pretrained models for natural language processing tasks.
Tags: Large Language Model, Transformers

## Openvision Vit Small Patch16 224
Apache-2.0 · UCSC-VLAA · 17 downloads · 0 likes
OpenVision is a fully open, cost-effective family of advanced vision encoders focused on multimodal learning.
Tags: Image Enhancement

## Bart Large Empathetic Dialogues
sourname · 199 downloads · 1 like
This model is built on the transformers library; its specific purpose and functionality are not documented.
Tags: Large Language Model, Transformers

## Falcon H1 1.5B Deep Base
Other · tiiuae · 194 downloads · 3 likes
Falcon-H1 is an efficient hybrid-architecture language model developed by TII, combining Transformer and Mamba architectures to support multilingual tasks.
Tags: Large Language Model, Transformers, Supports Multiple Languages

## Openbioner Base
MIT · disi-unibo-nlp · 210 downloads · 1 like
OpenBioNER is a lightweight BERT model designed for open-domain biomedical named entity recognition (NER). It can identify unseen entity types from natural language descriptions of the target types alone, without retraining.
Tags: Sequence Labeling, English

## Xglm 564M
MIT · facebook · 11.13k downloads · 51 likes
XGLM-564M is a multilingual autoregressive language model with 564 million parameters, trained on a balanced corpus of 30 languages totaling 500 billion subwords.
Tags: Large Language Model, Supports Multiple Languages

## Zero Mistral 24B
MIT · ZeroAgency · 41 downloads · 2 likes
Zero-Mistral-24B is an improved text-only model based on Mistral-Small-3.1-24B-Instruct-2503, primarily adapted for Russian and English, with the original visual capabilities removed to focus on text generation tasks.
Tags: Large Language Model, Transformers, Supports Multiple Languages

## Orpo Med V3
Apache-2.0 · Jayant9928 · 2,852 downloads · 3 likes
This is a transformers model hosted on the Hugging Face Hub; its specific functions and uses are not documented.
Tags: Large Language Model, Transformers

## Xlm Roberta Large Pooled Cap Minor
MIT · poltextlab · 61 downloads · 0 likes
A multilingual text classification model fine-tuned from xlm-roberta-large, used for minor topic code classification in comparative agendas projects.
Tags: Text Classification, PyTorch, Other

## Sam Vit Base
MIT · sajabdoli · 184 downloads · 0 likes
An improved version of the Facebook SAM model (sam-vit-base), specifically optimized for image segmentation tasks in CVAT.
Tags: Image Segmentation, Supports Multiple Languages

## Quantum STT
Apache-2.0 · sbapan41 · 100 downloads · 1 like
Quantum_STT is an advanced automatic speech recognition (ASR) and speech translation model, trained with large-scale weak supervision and supporting multiple languages and tasks.
Tags: Speech Recognition, Transformers, Supports Multiple Languages

## Kok Basev2
Apache-2.0 · moelanoby · 195 downloads · 1 like
Kok-Base is a multilingual model supporting English, Arabic, and Czech, suitable for various natural language processing tasks.
Tags: Large Language Model, Transformers, Supports Multiple Languages

## Internvl2 5 HiMTok 8B
Apache-2.0 · yayafengzi · 16 downloads · 3 likes
HiMTok is a hierarchical mask token learning framework fine-tuned on the InternVL2_5-8B large multimodal model, focused on image segmentation tasks.
Tags: Image-to-Text

## Distill Any Depth Small Hf
MIT · xingyang1 · 1,214 downloads · 3 likes
Distill-Any-Depth is a state-of-the-art monocular depth estimation model trained via knowledge distillation, delivering efficient and accurate depth estimates.
Tags: 3D Vision, Transformers

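The knowledge-distillation setup behind models like this one trains a small student network to reproduce a larger teacher's per-pixel depth predictions. The toy arrays and plain mean-squared-error objective below are invented for illustration and are not the actual training recipe:

```python
import numpy as np

# Invented stand-ins for per-pixel depth maps from a large teacher and a
# small student on a single 2x2 image.
teacher_depth = np.array([[1.0, 2.0],
                          [3.0, 4.0]])
student_depth = np.array([[1.5, 2.0],
                          [2.5, 4.5]])

# A distillation objective penalizes the student for deviating from the
# teacher's predictions; a plain MSE term is shown here for clarity.
distill_loss = np.mean((student_depth - teacher_depth) ** 2)
```

Minimizing this loss over many images pushes the small student toward the teacher's behavior at a fraction of the inference cost.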
## Illumiyume Anime Style Noobai Xl Nai Xl V10 Sdxl
Other · John6666 · 5,080 downloads · 1 like
An anime-style text-to-image generation model based on Stable Diffusion XL, focused on high-quality anime character creation.
Tags: Image Generation, English

## Allenai.olmocr 7B 0225 Preview GGUF
DevQuasar · 239 downloads · 1 like
olmOCR-7B-0225-preview is an OCR-based image-to-text model developed by AllenAI, designed to extract and recognize text content from images.
Tags: Large Language Model

## Llava NeXT Video 7B Hf
FriendliAI · 30 downloads · 0 likes
LLaVA-NeXT-Video-7B-hf is a video-based multimodal model capable of processing video and text inputs to generate text outputs.
Tags: Video-to-Text, English

## Qwen2.5 Dyanka 7B Preview
Apache-2.0 · Xiaojian9992024 · 1,497 downloads · 8 likes
A 7B-parameter language model based on the Qwen2.5 architecture, created by fusing multiple pre-trained models using the TIES method.
Tags: Large Language Model, Transformers

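The TIES method mentioned above merges several fine-tuned checkpoints by trimming small-magnitude parameter deltas, electing a per-parameter sign, and averaging only the deltas that agree with it. A toy sketch on flat numpy arrays (the weights, density value, and helper name are invented for illustration, not the recipe actually used for this model):

```python
import numpy as np

def ties_merge(base, finetuned, density=0.5):
    """Toy TIES merge: trim deltas, elect a per-parameter sign, average agreers."""
    task_vectors = [w - base for w in finetuned]
    trimmed = []
    for tv in task_vectors:
        # Trim: keep only the top `density` fraction of entries by magnitude.
        k = int(np.ceil(density * tv.size))
        threshold = np.sort(np.abs(tv).ravel())[-k]
        trimmed.append(np.where(np.abs(tv) >= threshold, tv, 0.0))
    stack = np.stack(trimmed)
    # Elect sign: per-parameter sign of the summed trimmed deltas.
    elected = np.sign(stack.sum(axis=0))
    # Disjoint merge: average only entries whose sign agrees with the elected one.
    agree = (np.sign(stack) == elected) & (stack != 0)
    merged = (stack * agree).sum(axis=0) / np.maximum(agree.sum(axis=0), 1)
    return base + merged

# Two invented "fine-tuned" deltas around a zero base; where the trimmed
# entries disagree in sign, the merged delta is dropped entirely.
base = np.zeros(4)
merged = ties_merge(base, [np.array([1.0, -0.2, 0.5, 0.0]),
                           np.array([0.8, 0.3, -0.5, 0.0])])
```

The sign-election step is what distinguishes TIES from naive weight averaging: conflicting updates cancel out instead of being blended into a compromise value.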
## Vit So400m Patch16 Siglip 384.v2 Webli
Apache-2.0 · timm · 2,073 downloads · 0 likes
Vision Transformer model based on SigLIP 2, designed for image feature extraction and pre-trained on the webli dataset.
Tags: Text-to-Image, Transformers

## Vit So400m Patch14 Siglip 378.v2 Webli
Apache-2.0 · timm · 30 downloads · 0 likes
Vision Transformer model based on SigLIP 2, designed for image feature extraction and trained on the webli dataset.
Tags: Text-to-Image, Transformers

## Vit Large Patch16 Siglip Gap 384.v2 Webli
Apache-2.0 · timm · 95 downloads · 0 likes
A Vision Transformer model based on the SigLIP 2 architecture, in a Global Average Pooling (GAP) variant that removes the attention pooling head; suitable for image feature extraction tasks.
Tags: Text-to-Image, Transformers

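In these GAP variants, the pooled image embedding is simply the mean of the patch tokens, rather than the output of a learned attention-pooling head. A minimal sketch (batch size, token count, and feature width are illustrative, not read from the actual checkpoint):

```python
import numpy as np

# Hypothetical ViT patch tokens: 2 images, 24 x 24 = 576 patches for a
# 384px input at patch size 16, with 1024-dim features per token.
rng = np.random.default_rng(0)
tokens = rng.standard_normal((2, 576, 1024))

# Global average pooling: averaging over the patch axis replaces the
# attention-pooling head, yielding one embedding per image.
pooled = tokens.mean(axis=1)
```

Dropping the attention-pooling head removes parameters and makes the encoder a drop-in feature extractor for downstream heads.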
## Vit Large Patch16 Siglip 512.v2 Webli
Apache-2.0 · timm · 295 downloads · 0 likes
A ViT image encoder based on SigLIP 2, packaged for timm and suitable for vision-language tasks.
Tags: Image Classification, Transformers

## Vit Giantopt Patch16 Siglip Gap 256.v2 Webli
Apache-2.0 · timm · 17 downloads · 0 likes
A SigLIP 2 ViT image encoder using global average pooling, with the attention pooling head removed, packaged for timm.
Tags: Image Classification, Transformers

## Vit Giantopt Patch16 Siglip 256.v2 Webli
Apache-2.0 · timm · 59 downloads · 0 likes
Vision Transformer model based on SigLIP 2, focused on image feature extraction.
Tags: Text-to-Image, Transformers

## Vit Base Patch32 Siglip 256.v2 Webli
Apache-2.0 · timm · 27 downloads · 0 likes
Vision Transformer model based on the SigLIP 2 architecture, designed for image feature extraction.
Tags: Text-to-Image, Transformers

## Vit Base Patch16 Siglip Gap 512.v2 Webli
Apache-2.0 · timm · 105 downloads · 0 likes
A ViT image encoder based on SigLIP 2, using global average pooling with the attention pooling head removed; suitable for image feature extraction tasks.
Tags: Image Classification, Transformers

## Vit Base Patch16 Siglip 512.v2 Webli
Apache-2.0 · timm · 2,664 downloads · 0 likes
Vision Transformer model based on SigLIP 2, designed for image feature extraction and pre-trained on the webli dataset.
Tags: Text-to-Image, Transformers

## Vit So400m Patch16 Siglip Gap 512.v2 Webli
Apache-2.0 · timm · 21 downloads · 0 likes
A ViT image encoder based on SigLIP 2, utilizing global average pooling; suitable for vision-language tasks.
Tags: Text-to-Image, Transformers

## Qwen2.5 14B CIC ACLARC GGUF
Apache-2.0 · sknow-lab · 42 downloads · 1 like
A quantized version of the Qwen2.5-14B-Instruct model, specifically designed for citation intent classification tasks.
Tags: Large Language Model, English

## Qwen2.5 14B CIC SciCite GGUF
Apache-2.0 · sknow-lab · 57 downloads · 1 like
A citation intent classification model fine-tuned from Qwen2.5-14B-Instruct, specializing in citation analysis for scientific literature.
Tags: Large Language Model, English

## Gliner Biomed Bi Large V1.0
Apache-2.0 · Ihor · 56 downloads · 1 like
GLiNER-BioMed is an efficient open NER model suite based on the GLiNER framework, specifically designed for the biomedical domain and able to recognize various types of biomedical entities.
Tags: Sequence Labeling, English

## Gliner Biomed Bi Base V1.0
Apache-2.0 · Ihor · 25 downloads · 1 like
GLiNER-BioMed is an efficient open biomedical named entity recognition model suite based on the GLiNER framework, capable of recognizing multiple entity types in the biomedical domain.
Tags: Sequence Labeling, English

## Healthgpt L14
MIT · lintw · 43 downloads · 7 likes
HealthGPT is a model specifically developed for unified multimodal medical tasks.
Tags: Large Language Model, English

## Cuckoo C4
MIT · KomeijiForce · 15 downloads · 1 like
Cuckoo is a small (300M-parameter) information extraction model that extracts information efficiently by mimicking the next-word prediction paradigm of large language models.
Tags: Large Language Model, Transformers

## ENEL
IvanTang · 17 downloads · 1 like
ENEL is a model exploring the potential of encoder-free architectures in 3D large multimodal models.
Tags: Image-to-Text

## Pi0
Apache-2.0 · lerobot · 11.84k downloads · 230 likes
Pi0 is a general robot control model based on a vision-language-action flow, supporting robot control tasks.
Tags: Multimodal Fusion

## Felguk Suno Or People
Apache-2.0 · Felguk · 58 downloads · 1 like
This model classifies audio clips as either 'Suno' music or 'People' music.
Tags: Audio Classification, Transformers, Supports Multiple Languages

## Internlm3 8b Instruct
Apache-2.0 · internlm · 53.04k downloads · 217 likes
InternLM3-8B-Instruct is an 8-billion-parameter instruction model developed by Shanghai AI Laboratory, designed for general-purpose use and advanced reasoning, featuring high efficiency and low cost.
Tags: Large Language Model

## Sam2 Hiera Small.fb R896
Apache-2.0 · timm · 142 downloads · 0 likes
A SAM2 model based on the HieraDet image encoder, focused on image feature extraction tasks.
Tags: Image Segmentation, Transformers